Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat: Add-rc-locator-to-partition-excel #3258

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

marctorsoc
Copy link

@marctorsoc marctorsoc commented Jun 20, 2024

As described in #3236, I wanted to add the coordinate RC to the metadata of the returned elements. I'm adding both rc and excel_rc. The former being 0-based, the latter 1-based (as in Excel).

@marctorsoc
Copy link
Author

can I get some feedback here? e.g. @potter-potter @rbiseck3 (sorry for the spam, feel free to share with whoever that could help)

@marctorsoc
Copy link
Author

Hi @potter-potter, I fixed the conflicts. Could you enable CI at least?

@potter-potter
Copy link
Contributor

Hi @potter-potter, I fixed the conflicts. Could you enable CI at least?

hey @marctorsoc I alerted @scanny to this pr earlier this month. you'll have to go through him.

@scanny
Copy link
Collaborator

scanny commented Aug 1, 2024

Hi @marctorsoc I'm not sure this metadata is of general interest for users of the library. Let me ask around and see what folks have to say.

For sure we wouldn't want two fields with the same value offset by 1. A user could easily adjust for themselves if they want cardinal vs. ordinal numbering.

And starting_page_number has a different meaning having to do with adjusting page-numbering, so repurposing it for selective partitioning would change its behavior. That parameter is used across multiple partitioners and it needs to have consistent semantics across all those.

I think your best bet for the moment would be to use your fork if you want these new behaviors.

@marctorsoc
Copy link
Author

Hi @marctorsoc I'm not sure this metadata is of general interest for users of the library. Let me ask around and see what folks have to say.

For sure we wouldn't want two fields with the same value offset by 1. A user could easily adjust for themselves if they want cardinal vs. ordinal numbering.

And starting_page_number has a different meaning having to do with adjusting page-numbering, so repurposing it for selective partitioning would change its behavior. That parameter is used across multiple partitioners and it needs to have consistent semantics across all those.

I think your best bet for the moment would be to use your fork if you want these new behaviors.

Hi @scanny, I had removed the second bit, but did not update the description. I'm fine with removing that.

Regarding the two fields, I think they are useful because one is how dataframes and in general programming works (0-based) vs how Excel works. So when checking what the library gives vs the excel, it's convenient to have the excel formatting. It's not only an offset, but also converting the column to a letter. But yeah, I'm ok with removing that too.

IMHO, providing the coordinates of each partitioned element should be part of the metadata. For any type of document. This is the case for pdfs. But for docx and txt it's the same story as xlsx. It's used internally but not surfaced to the user, who could also be interested.

Well, lmk what the wise council decide 😉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants